Automatically Neutralizing Subjective Bias in Text
Texts like news, encyclopedias, and some social media strive for objectivity.
Yet bias in the form of inappropriate subjectivity - introducing attitudes via
framing, presupposing truth, and casting doubt - remains ubiquitous. This kind
of bias erodes our collective trust and fuels social conflict. To address this
issue, we introduce a novel testbed for natural language generation:
automatically bringing inappropriately subjective text into a neutral point of
view ("neutralizing" biased text). We also offer the first parallel corpus of
biased language. The corpus contains 180,000 sentence pairs and originates from
Wikipedia edits that removed various framings, presuppositions, and attitudes
from biased sentences. Last, we propose two strong encoder-decoder baselines
for the task. A straightforward yet opaque CONCURRENT system uses a BERT
encoder to identify subjective words as part of the generation process. An
interpretable and controllable MODULAR algorithm separates these steps, using
(1) a BERT-based classifier to identify problematic words and (2) a novel join
embedding through which the classifier can edit the hidden states of the
encoder. Large-scale human evaluation across four domains (encyclopedias, news
headlines, books, and political speeches) suggests that these algorithms are a
first step towards the automatic identification and reduction of bias.
Comment: To appear at AAAI 202
CLIMB: Curriculum Learning for Infant-inspired Model Building
We describe our team's contribution to the STRICT-SMALL track of the BabyLM Challenge. The challenge requires training a language model from scratch using only a relatively small training dataset of ten million words. We experiment with three variants of cognitively-motivated curriculum learning and analyze their effect on the performance of the model on linguistic evaluation tasks. In the vocabulary curriculum, we analyze methods for constraining the vocabulary in the early stages of training to simulate cognitively more plausible learning curves. In the data curriculum experiments, we vary the order of the training instances based on i) infant-inspired expectations and ii) the learning behavior of the model. In the objective curriculum, we explore different variations of combining the conventional masked language modeling task with a more coarse-grained word class prediction task to reinforce linguistic generalization capabilities. Our results did not yield consistent improvements over our own non-curriculum learning baseline across a range of linguistic benchmarks; however, we do find marginal gains on select tasks. Our analysis highlights key takeaways for specific combinations of tasks and settings which benefit from our proposed curricula. We moreover determine that careful selection of model architecture and training hyper-parameters yields substantial improvements over the default baselines provided by the BabyLM Challenge.
Human adaptation of Ebola virus during the West African outbreak
The 2013–2016 outbreak of Ebola virus (EBOV) in West Africa was the largest recorded. It began following the cross-species transmission of EBOV from an animal reservoir, most likely bats, into humans, with phylogenetic analysis revealing the cocirculation of several viral lineages. We hypothesized that this prolonged human circulation led to genomic changes that increased viral transmissibility in humans. We generated a synthetic glycoprotein (GP) construct based on the earliest reported isolate and introduced amino acid substitutions that defined viral lineages. Mutant GPs were used to generate a panel of pseudoviruses, which were used to infect different human and bat cell lines. These data revealed that specific amino acid substitutions in the EBOV GP have increased tropism for human cells, while reducing tropism for bat cells. Such increased infectivity may have enhanced the ability of EBOV to transmit among humans and contributed to the wide geographic distribution of some viral lineages.
Catching Element Formation In The Act
Gamma-ray astronomy explores the most energetic photons in nature to address
some of the most pressing puzzles in contemporary astrophysics. It encompasses
a wide range of objects and phenomena: stars, supernovae, novae, neutron stars,
stellar-mass black holes, nucleosynthesis, the interstellar medium, cosmic rays
and relativistic-particle acceleration, and the evolution of galaxies. MeV
gamma-rays provide a unique probe of nuclear processes in astronomy, directly
measuring radioactive decay, nuclear de-excitation, and positron annihilation.
The substantial information carried by gamma-ray photons allows us to see
deeper into these objects, the bulk of the power is often emitted at gamma-ray
energies, and radioactivity provides a natural physical clock that adds unique
information. New science will be driven by time-domain population studies at
gamma-ray energies. This science is enabled by next-generation gamma-ray
instruments with one to two orders of magnitude better sensitivity, larger sky
coverage, and faster cadence than all previous gamma-ray instruments. This
transformative capability permits: (a) the accurate identification of the
gamma-ray emitting objects and correlations with observations taken at other
wavelengths and with other messengers; (b) construction of new gamma-ray maps
of the Milky Way and other nearby galaxies where extended regions are
distinguished from point sources; and (c) considerable serendipitous science of
scarce events -- nearby neutron star mergers, for example. Advances in
technology push the performance of new gamma-ray instruments to address a wide
set of astrophysical questions.
Comment: 14 pages including 3 figures
Trypanosoma brucei Glycogen Synthase Kinase-3, A Target for Anti-Trypanosomal Drug Development: A Public-Private Partnership to Identify Novel Leads
Over 60 million people in sub-Saharan Africa are at risk of infection with the parasite Trypanosoma brucei, which causes Human African Trypanosomiasis (HAT), also known as sleeping sickness. The disease results in systemic and neurological disability to its victims. At present, only four drugs are available for treatment of HAT. However, these drugs are expensive, limited in efficacy and severely toxic, hence the need to develop new therapies. Previously, TbruGSK-3 short has been validated as a potential target for developing new drugs against HAT. Because this enzyme has also been pursued as a drug target for other diseases, several inhibitors are available for screening against the parasite enzyme. Here we present the results of screening over 16,000 inhibitors of human GSK-3β (HsGSK-3) from the Pfizer compound collection against TbruGSK-3 short. The resulting active compounds were tested for selectivity versus HsGSK-3β and a panel of human kinases, as well as their ability to inhibit proliferation of the parasite in vitro. We have identified attractive compounds that now form potential starting points for drug discovery against HAT. This is an example of how a tripartite partnership involving pharmaceutical industries, academic institutions and non-government organisations such as WHO TDR can stimulate research for neglected diseases.